Overview

This report summarizes the results of the consensus clustering module, which uses resampling-based consensus clustering (Monti et al., 2003) to derive robust proteome clusters. For each candidate number of clusters K, 1000 bootstrap sample data sets are clustered into K clusters using k-means, and a consensus matrix is constructed whose entry (i, j) records the number of times items i and j are assigned to the same cluster divided by the number of times both items are selected. Values of K from 2 to 10 are evaluated, and the best K is determined by comparing the empirical cumulative distribution functions (CDFs) of the resulting consensus matrices. To compare the clusterings, the increase in the area under the CDF, Kdelta, is computed for each K, and the K with the largest Kdelta is defined as the best K. Detailed documentation for the consensus clustering module can be found here.

This report shows the metrics used to compare the clusterings and determine the best K, the consensus matrix for the best K, a principal component analysis (PCA) plot for the best K clusters, and marker selection and GSEA results for each cluster.
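The consensus matrix and CDF area described in the overview can be illustrated with a minimal sketch. This is not the module's implementation: the function names (`kmeans_labels`, `consensus_matrix`, `cdf_area`), the plain Lloyd's k-means, and the default of 100 bootstrap samples are all assumptions made for illustration, and the exact definition of Kdelta used by the module is not reproduced here. The sketch uses only the Python standard library and represents each item as a tuple of coordinates.

```python
import random


def kmeans_labels(points, k, rng, n_iter=25):
    """Plain Lloyd's k-means (illustrative); returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, n_boot=100, seed=0):
    """Entry (i, j) = times i and j co-clustered / times both were selected."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    selected = [[0] * n for _ in range(n)]
    for _ in range(n_boot):
        # Bootstrap sample: draw n item indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        labels = kmeans_labels([points[i] for i in idx], k, rng)
        # Duplicated items have identical coordinates, hence identical labels.
        label_of = dict(zip(idx, labels))
        items = sorted(label_of)
        for i in items:
            for j in items:
                selected[i][j] += 1
                together[i][j] += label_of[i] == label_of[j]
    return [
        [together[i][j] / selected[i][j] if selected[i][j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def cdf_area(matrix):
    """Area under the empirical CDF of the off-diagonal consensus values."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    # For values in [0, 1], the integral of the empirical CDF over [0, 1]
    # equals one minus the mean of the values.
    return 1.0 - sum(values) / len(values)
```

Calling `cdf_area(consensus_matrix(points, k))` for each K in the evaluated range gives one area per K. The change in area between consecutive K values is then the kind of quantity a Kdelta criterion compares.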

Clustering metrics

Figure: Interactive plots of clustering metrics for K = 2 through K = 10. The best K for each metric is highlighted in red. Hover over a point to see its score.

Table: Best K and score for different clustering metrics.

Results for best K = 3

Consensus matrix for K = 3

Figure: Consensus matrix for best K = 3.

PCA plot for best K = 3

Figure: Principal component analysis plot for K = 3. Clusters are distinguished by color and shape. Each point is labeled with its sample name.

Marker selection results for K = 3

Figure: Heatmap showing marker selection results for K = 3 clusters. Columns (samples) are sorted by cluster number, and rows (features) are ordered by hierarchical clustering based on Pearson correlation. Sample annotations, including cluster number, are shown at the top of the heatmap.

GSEA results for Cluster 1

Figure: Interactive volcano plot summarizing GSEA pathway results for cluster 1. The x-axis shows the normalized enrichment score (NES); negative NES values indicate enrichment in cluster 1, and positive NES values indicate enrichment in the rest of the clusters. The y-axis shows -log10 of the FDR. Results above the dashed line are significant at an FDR cutoff of 0.01. Hover over a point to see the pathway and category it corresponds to.
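As a small worked example of how a single GSEA result maps onto the volcano plot, the sketch below converts an NES and FDR pair into plot coordinates. The function name `volcano_point`, the returned dictionary layout, and the floor applied to an FDR of zero are assumptions for illustration only. The sign convention and the 0.01 cutoff are taken from the caption above.

```python
import math

FDR_CUTOFF = 0.01  # cutoff stated in the figure caption


def volcano_point(nes, fdr, cluster, floor=1e-300):
    """Map one pathway result to volcano-plot coordinates (illustrative)."""
    # The floor avoids log10(0) when the FDR is reported as exactly zero.
    y = -math.log10(max(fdr, floor))
    if nes < 0:
        side = f"cluster {cluster}"
    elif nes > 0:
        side = "rest of the clusters"
    else:
        side = "neither"
    return {
        "x": nes,
        "y": y,
        "significant": fdr < FDR_CUTOFF,
        "enriched_in": side,
    }


# Height of the dashed significance line: -log10(0.01), which is 2.
dashed_line_y = -math.log10(FDR_CUTOFF)
```

Under these assumptions, a pathway with a negative NES and an FDR of 0.001 falls on the left side of the plot at a height of about 3, above the dashed line at 2.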

No significantly enriched pathways in cluster 1 with FDR < 0.01.

GSEA results for Cluster 2

Figure: Interactive volcano plot summarizing GSEA pathway results for cluster 2. The x-axis shows the normalized enrichment score (NES); negative NES values indicate enrichment in cluster 2, and positive NES values indicate enrichment in the rest of the clusters. The y-axis shows -log10 of the FDR. Results above the dashed line are significant at an FDR cutoff of 0.01. Hover over a point to see the pathway and category it corresponds to.

No significantly enriched pathways in cluster 2 with FDR < 0.01.

GSEA results for Cluster 3

Figure: Interactive volcano plot summarizing GSEA pathway results for cluster 3. The x-axis shows the normalized enrichment score (NES); negative NES values indicate enrichment in cluster 3, and positive NES values indicate enrichment in the rest of the clusters. The y-axis shows -log10 of the FDR. Results above the dashed line are significant at an FDR cutoff of 0.01. Hover over a point to see the pathway and category it corresponds to.

No significantly enriched pathways in cluster 3 with FDR < 0.01.